Efficient String Matching and Easy Bottom-Up Parsing

Author

  • Ján Šturc
Abstract

The paper consists of two parts. In the first, we compare three string matching algorithms: the Dömölki algorithm, also known as the SHIFT-OR algorithm, O(n × m); the table-driven algorithm, O((n + m) × lg m); and the Aho-Corasick algorithm, O(m + n). The table-driven algorithm passes through the same states as the Dömölki algorithm but uses a compact state encoding. It can also be regarded as the Aho-Corasick algorithm with ε-transitions eliminated. We argue that the table-driven algorithm is the best solution for matching multiple patterns of reasonable size. The second part of the paper deals with bottom-up syntax analysis. We show that backward deterministic syntax analysis can be implemented by extending a string matching automaton with a stack, or with two stacks if we want to go beyond context-free grammars. Implementing a parser of this type is as easy as writing a recursive descent parser; we need to supply only the transition table, which can be easily derived from the grammar. Finally, we discuss some compiler engineering details.
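
As an illustration of the SHIFT-OR idea named in the abstract, the following is a minimal single-pattern sketch in Python. It is not taken from the paper: the function name `shift_or_search` and the mask layout are assumptions made for this example, and the paper's multi-pattern and table-driven variants are not reproduced here. Bit i of the state is 0 exactly when the first i + 1 pattern characters match the text ending at the current position.

```python
def shift_or_search(text: str, pattern: str) -> list[int]:
    """Return the start indices of all occurrences of pattern in text.

    Hypothetical helper for illustration; single pattern only.
    """
    m = len(pattern)
    if m == 0:
        return []
    all_ones = (1 << m) - 1

    # masks[c] has bit i cleared exactly when pattern[i] == c.
    masks: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, all_ones) & ~(1 << i)

    state = all_ones          # no prefix of the pattern matched yet
    accept = 1 << (m - 1)     # bit that signals a full match when it is 0
    hits: list[int] = []
    for j, ch in enumerate(text):
        # Shifting brings in a 0 at bit 0 (the empty prefix always matches);
        # OR-ing with the mask keeps bit i at 0 only if pattern[i] == ch.
        state = ((state << 1) | masks.get(ch, all_ones)) & all_ones
        if (state & accept) == 0:
            hits.append(j - m + 1)
    return hits


if __name__ == "__main__":
    print(shift_or_search("abracadabra", "abra"))
```

Each text character costs one shift and one OR on an m-bit state. When m fits in a machine word this is constant work per character; for longer patterns the state spans several words, which is one way to read the O(n × m) bound quoted above.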


Similar resources

Bottom-Up Parsing Extending Context-Freeness in a Process Grammar Processor

A new approach to bottom-up parsing that extends Augmented Context-Free Grammar to a Process Grammar is formally presented. A Process Grammar (PG) defines a set of rules suited for bottom-up parsing and conceived as processes that are applied by a PG Processor. The matching phase is a crucial step for process application, and a parsing structure for efficient matching is also presented. The PG...

A Parsing Algorithm for Unification Grammar

We describe a table-driven parser for unification grammar that combines bottom-up construction of phrases with top-down filtering. This algorithm works on a class of grammars called depth-bounded grammars, and it is guaranteed to halt for any input string. Unlike many unification parsers, our algorithm works directly on a unification grammar--it does not require that we divide the grammar into ...

Efficient Retargetable Code Generation Using Bottom-up Tree Pattern Matching

Instruction selection is the primary task in automatic code generation. This paper proposes a practical system for performing optimal instruction selection based on tree pattern matching for expression trees. A significant feature of the system is its ability to perform code generation without requiring cost analysis at code generation time. The target machine instructions are specified as attr...

A Bottom-up Parser where Entire Operation is Conducted in the Letter String Region

This paper describes a bottom-up natural language parser. Its distinguishing characteristic is that the data being processed keep the form of a letter string throughout all parsing operations. Letter strings that include parentheses express the partial trees generated in the course of parsing. This representation avoids the list expressions usually used to represent trees. Key-Words: parser, b...

An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities

We describe an extension of Earley's parser for stochastic context-free grammars that computes the following quantities given a stochastic context-free grammar and an input string: a) probabilities of successive prefixes being generated by the grammar; b) probabilities of substrings being generated by the nonterminals, including the entire string being generated by the grammar; c) most likely (...

Publication date: 2007